Direct Policy Optimization using Deterministic Sampling and Collocation

Abstract

We present an approach for approximately solving discrete-time stochastic optimal-control problems by combining direct trajectory optimization, deterministic sampling, and policy optimization. Our feedback motion-planning algorithm uses a quasi-Newton method to simultaneously optimize a reference trajectory, a set of deterministically chosen sample trajectories, and a parameterized policy. We demonstrate that this approach exactly recovers LQR policies in the case of linear dynamics, a quadratic objective, and Gaussian disturbances. We also demonstrate the algorithm on several nonlinear, underactuated robotic systems to highlight its performance and its ability to handle control limits, safely avoid obstacles, and generate robust plans in the presence of unmodeled dynamics.
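The LQR baseline the abstract refers to can be computed in closed form by a backward Riccati recursion. The sketch below is a minimal illustration of that standard finite-horizon discrete-time LQR solution; the double-integrator system and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, N):
    """Finite-horizon discrete-time LQR via backward Riccati recursion.

    Returns gains K[0..N-1] for the feedback policy u_t = -K[t] @ x_t.
    """
    P = Qf                      # cost-to-go Hessian at the final time
    gains = []
    for _ in range(N):
        # K = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A'P(A - BK)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]          # reorder so gains[t] applies at time t

# Illustrative double-integrator example (position, velocity; force input)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)
R = np.array([[1.0]])
Qf = 10.0 * np.eye(2)
K = lqr_gains(A, B, Q, R, Qf, N=50)
```

Rolling out x_{t+1} = (A - B K[t]) x_t from any initial state drives the state toward the origin, which is the policy a method like the one above must reproduce exactly in the linear-quadratic-Gaussian case.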

Related articles

Direct, Stochastic Analogues to Deterministic Optimization Methods Using Statistical Filters

Stochastic optimization, i.e., optimization over problems that involve random variables, is a fundamental challenge in many disciplines. Unfortunately, current solvers for stochastic optimization restrictively require finiteness, either by replacing the original problem with a sample-average surrogate or by assuming complete knowledge of a finite population. To help alleviate this restriction, we state a general, no...

Deterministic secure direct communication using entanglement.

A novel secure communication protocol is presented, based on an entangled pair of qubits and allowing asymptotically secure key distribution and quasisecure direct communication. Since the information is transferred in a deterministic manner, no qubits have to be discarded. The transmission of information is instantaneous, i.e., the information can be decoded during the transmission. The securi...

Deterministic Secure Direct Communication Using Mixed State

We present an improved ping-pong protocol based on the protocol of Kim Bostrom and Timo Felbinger [Phys. Rev. Lett. 89, 187902 (2002)]. We show that our protocol provides asymptotically secure key distribution and quasi-secure direct communication using a mixed state. This protocol can be carried out with great efficiency and speed with today's technology, e.g. single photon source and ...

Solving deterministic policy (PO)MDPs using Expectation-Maximisation and Antifreeze

Solving Markov Decision Processes and their partially observable extension amounts to finding policies that maximise the expected reward. We follow the rephrasing of this problem as learning in a related probabilistic model. Our trans-dimensional distribution formulation obtains equivalent results to previous work in the infinite horizon case and also rigorously handles the fini...


Journal

Journal title: IEEE Robotics and Automation Letters

Year: 2021

ISSN: 2377-3766

DOI: https://doi.org/10.1109/lra.2021.3068890